ML System Monitoring and Continual Learning

Causes of ML System Failures

Software system failures

ML-Specific Failures

Designing Machine Learning Systems

Edge cases vs. Outliers

  • Outliers refer to data: an example that differs significantly from other examples. Edge cases refer to performance: an example where a model performs significantly worse than other examples.
  • An outlier can cause a model to perform unusually poorly, which makes it an edge case. However, not all outliers are edge cases. For example, a person jaywalking on a highway is an outlier, but it’s not an edge case if your self-driving car can accurately detect that person and decide on an appropriate motion response.
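
A minimal sketch of the distinction above, using made-up numbers: outliers are flagged from the input values alone, while edge cases are flagged from the model's per-example error. The function names, the z-score threshold of 2.0, and the error threshold of 0.5 are all illustrative assumptions, not a recommended detection method.

```python
import statistics


def find_outliers(values, z_threshold=2.0):
    """Indices of values far from the mean: a property of the DATA."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return set()
    return {i for i, v in enumerate(values) if abs(v - mean) / stdev > z_threshold}


def find_edge_cases(errors, error_threshold):
    """Indices where the model error is unusually high: a property of PERFORMANCE."""
    return {i for i, e in enumerate(errors) if e > error_threshold}


# Hypothetical feature values and per-example model errors (same ordering).
features = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 9.0, 1.1, 0.9]
errors = [0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

outliers = find_outliers(features)                         # unusual inputs
edge_cases = find_edge_cases(errors, error_threshold=0.5)  # poor predictions
```

In this toy data the example at index 7 is an outlier that the model handles well (like the jaywalker the car detects correctly), while the example at index 2 looks ordinary but is an edge case.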

Monitoring & Observability

Monitoring

= the act of tracking, measuring, and logging different metrics that can help us determine when something goes wrong

Metrics

Tips

Always think about how quickly the metrics in your data change.
For example, user data generally changes slowly, but B2B data can change very quickly.

Monitoring toolbox

Performance auditing

Observability

= setting up our system in a way that gives us visibility into it, so that we can investigate what went wrong

Continual Learning

Types of model updates

The two types can also be regarded as model-centric vs. data-centric AI development.

Stateful retraining vs. Stateless retraining

Four Stages of Continual Learning

  1. Stage 1 - Manual, stateless retraining
  2. Stage 2 - Automated retraining: needs infrastructure
  3. Stage 3 - Automated, stateful training: needs a fixed model update schedule
  4. Stage 4 - Continual learning: the model is updated automatically whenever a trigger fires:
    • time-based
    • performance-based
    • volume-based
    • drift-based
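
The four trigger types above could be checked together along the following lines. This is a sketch under stated assumptions: `TriggerConfig`, `fired_triggers`, and every threshold value are hypothetical, and the drift score is assumed to be computed elsewhere (for example, as some distance between the training and serving distributions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerConfig:
    """Illustrative thresholds only; real values depend on the system."""
    max_hours_since_update: float = 24.0  # time-based
    min_accuracy: float = 0.90            # performance-based
    min_new_examples: int = 10_000        # volume-based
    max_drift_score: float = 0.2          # drift-based


def fired_triggers(hours_since_update, accuracy, new_examples, drift_score, cfg=None):
    """Return the names of all triggers that currently call for a model update."""
    cfg = cfg or TriggerConfig()
    fired = []
    if hours_since_update >= cfg.max_hours_since_update:
        fired.append("time")
    if accuracy < cfg.min_accuracy:
        fired.append("performance")
    if new_examples >= cfg.min_new_examples:
        fired.append("volume")
    if drift_score > cfg.max_drift_score:
        fired.append("drift")
    return fired


# Usage: update the model as soon as any trigger fires.
should_update = bool(fired_triggers(30, 0.85, 100, 0.05))
```

Returning the list of fired triggers rather than a single boolean makes it easier to log why an update was started.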

Test in Production

There are several techniques for evaluating a model in production:

Blue/Green

Shadow deployment / Challenger model

  1. Deploy the candidate model in parallel with the existing model.
  2. For each incoming request, route it to both models to make predictions, but only serve the existing model’s prediction to the user.
  3. Log the predictions from the new model for analysis.
  4. Replace the existing model with the new model if the new model’s predictions are satisfactory.
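
The steps above can be sketched as a small router. This is an illustrative sketch, not a production design: `ShadowRouter` is a hypothetical name, models are assumed to be plain callables, and the in-memory list stands in for a real logging system.

```python
class ShadowRouter:
    """Serves the existing model and silently logs the candidate's predictions."""

    def __init__(self, existing_model, candidate_model):
        self.existing_model = existing_model
        self.candidate_model = candidate_model
        self.shadow_log = []  # stand-in for a real log store

    def predict(self, request):
        served = self.existing_model(request)
        try:
            shadow = self.candidate_model(request)
        except Exception as exc:
            # A failing candidate must never affect what the user receives.
            shadow = exc
        self.shadow_log.append({"request": request, "served": served, "shadow": shadow})
        return served  # only the existing model's prediction reaches the user

    def agreement_rate(self):
        """Fraction of logged requests where both models gave the same prediction."""
        if not self.shadow_log:
            return 0.0
        same = sum(1 for row in self.shadow_log if row["served"] == row["shadow"])
        return same / len(self.shadow_log)


# Usage with two toy "models" that differ only at the boundary value 0.
router = ShadowRouter(lambda x: x > 0, lambda x: x >= 0)
served = [router.predict(v) for v in (-1, 0, 1)]
```

Agreement with the existing model is only one possible signal; whether the candidate's predictions are "satisfactory" is a judgment the logged data supports but does not make on its own.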

A/B testing (AB testing)

  1. Deploy the candidate model alongside the existing model.
  2. A percentage of traffic is routed to the new model for predictions; the rest is routed to the existing model for predictions.
  3. Monitor and analyze the predictions and user feedback, then run a statistical test to check whether the difference between the two models is significant. This usually requires a long validation cycle.
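
A minimal sketch of steps 2 and 3, assuming the outcome being compared is a binary success (for example, a click) and that a two-proportion z-test is an acceptable test for it. The function names, the hash-based traffic split, and the example counts are illustrative assumptions.

```python
import hashlib
import math


def assign_variant(user_id, candidate_share=0.1):
    """Deterministically route a fixed share of users to the candidate model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = (int(digest, 16) % 10_000) / 10_000  # stable value in [0, 1)
    return "candidate" if bucket < candidate_share else "existing"


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two success rates."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: existing model 200/2000 successes, candidate 260/2000.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
significant = p_value < 0.05
```

Hashing the user ID keeps each user on the same model across requests, which avoids mixing the two experiences for one person.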

Canary release

  1. Deploy the candidate model alongside the existing model. The candidate model is called the canary.
  2. A portion of the traffic is routed to the candidate model.
  3. If its performance is satisfactory, increase the traffic to the candidate model. If not, abort the canary and route all the traffic back to the existing model.
  4. Stop when either the canary serves all the traffic (the candidate model has replaced the existing model) or when the canary is aborted.
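
The canary procedure above can be sketched as a loop over increasing traffic shares. This is a sketch under assumptions: `run_canary`, the step sizes, and the `is_satisfactory` callback (which would wrap real monitoring checks) are all hypothetical.

```python
def run_canary(is_satisfactory, traffic_steps=(0.05, 0.25, 0.5, 1.0)):
    """Ramp traffic to the canary step by step.

    is_satisfactory(fraction) is assumed to report whether the canary
    performed acceptably while serving that fraction of traffic.
    Returns (status, history of traffic fractions given to the canary).
    """
    history = []
    for fraction in traffic_steps:
        history.append(fraction)  # route this share of traffic to the canary
        if fraction >= 1.0:
            # The canary serves all traffic: it has replaced the existing model.
            return "promoted", history
        if not is_satisfactory(fraction):
            history.append(0.0)  # abort: all traffic goes back to the existing model
            return "aborted", history
    return "in_progress", history  # steps ended before reaching 100% of traffic


# Usage: a canary that stays healthy, and one that degrades at 25% of traffic.
healthy = run_canary(lambda fraction: True)
degraded = run_canary(lambda fraction: fraction < 0.25)
```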

Interleaving experiments

Multi-Armed Bandits